ABSTRACT

In order to determine whether the kind of process
underlying cloze responses is indeed a systematic and exhaustive
search, a study was conducted exploring some corollaries to such a
search hypothesis. It was assumed that subjects would generate
responses representing a number of word types, that some of these
word types would be sensible and some nonsensical, and that responses
would be representative of the entire body of possible response
words. Five versions of a 300-word cloze passage, with every fifth
word deleted, were administered to 390 junior-high-school students
who were randomly assigned to one of the versions. Protocols were
hand scored and success probabilities were calculated. A correlation
matrix among seven variables was calculated and analyzed using a
stepwise regression program. Significant correlations were noted
among the seven variables, with the highest correlation appearing
between size of response body and success probability. It was
concluded that the general search hypothesis appeared to be sustained
since distribution of responses was related to success probability
and since the ratio of nonsense to sensible responses was relevant to
that distribution. It was implied that a search process could be
characterized as systematic in part. Tables and references are
included. (MS)

Students of the processes underlying the acquisition of information from
written communication have from time to time used a technique referred to
as the cloze procedure. So far, the cloze procedure has been used
predominantly in connection with the measurement of reading achievement and
readability formulas. Rankin (1965) and, more recently, Bickley, Bickley,
and Ellington (1970) have summarized research dealing with the cloze
procedure and its applications.

A cloze task consists of a language passage in which words have been
deleted according to some prearranged scheme. Subjects are asked to
guess the missing words. Although various scoring techniques have been
described (e.g., Taylor and Waldman, 19[?]9), in most cases a right-wrong
scoring procedure is used where exact replacements of deleted words
constitute correct responses.

The question asked then is: what kind of process underlies the
production of cloze responses? Most popular has been the assumption that
the organism engages in some kind of systematic and exhaustive search
process. One bit of evidence for such a search hypothesis was provided
by Taylor (1954). He found that the number of word types emitted by a
sample of subjects in response to a particular deletion correlated highly
negatively with the probability of that deletion being "clozed" successfully.

Taylor, who conceived the cloze procedure (Taylor, 1953), together
with some colleagues studied the relative latencies of semantic aphasics,
stutterers, and normals for cloze items requiring unique or non-unique
responses (Taylor, Lore, and Waldman, 1967). Unique responses were
responses to blanks which were constrained by the bilateral context to
the point that only one specific word could possibly make sense. One
not unexpected finding of this study was that unique responses required
shorter latencies than non-unique responses. This result would be
predicted if one were to assume that a systematic search process underlies
the production of closures.

In Weaver’s (1965, p. 131) opinion, the constraints involved in the
cloze "enable us to get a close-up view of what is occurring at particular
points in language passage." A major issue in this context is the nature
of what exactly is occurring when, in the midst of a decoding operation,
a reader is forced to engage in a productive operation.

The solution of this issue is important both for a theory of reading
and of language processes in general. In normal reading, or listening
for that matter, very little interruption of the decoding process from the
outside takes place. Conceivably, however, there are many instances in
both these receptive processes where internally stimulated productive
behavior interrupts the decoding process per se. The degree to which
this is true seems to determine the importance of understanding the nature
of the cloze task for an increased understanding of the nature of the
reading process.

Weaver (1965, p. 130) challenges the postulation of this kind of search
hypothesis to some extent: "It is easy to show that exhaustive search
procedures would be much more time consuming than any human being would
afford, and the illogicalness of many of our search efforts is obvious."
Thus, Weaver considers the positing of a logically exhaustive search
process as only an approximation to the situation in reality.

To date, hardly any data have been presented which throw a direct
light upon the extent to which one can speak of search behavior in
connection with cloze. In the present study, some of the corollaries to
a search hypothesis were explored in a tentative fashion. The following
assumptions were made:

a) Given a cloze blank, a sample of n subjects will generate n responses
(word tokens) which represent k word types, where 0 ≤ k ≤ n. The minimal
value of k occurs when no subject attempts a response; k is maximized when
all Ss emit a different response.

b) The k response types consist of two different kinds. When considering
the fill-ins to a particular cloze blank one is always struck by the fact
that some responses just don't make sense at all; they are either
syntactically inadmissible words or seem semantically incongruous with the
context. These are the words which fall in the N(onsense) class; the others
are S(ensible) responses. Theoretically at least, one can assume that the k
response types consist of k_1 N-types and k_2 S-types, where k = k_1 + k_2.

c) The sample of responses emitted by a given sample of subjects is
representative of a population of responses for a particular blank for
the population of subjects from which that sample was drawn. That is,
the researcher never has data about the corpus of words from which his
Ss supposedly sample unless he assumes that the words actually emitted
are representative of that corpus or, if you will, population of words.

In regard to assumption b, one further comment needs to be made.
Consider the following sentence:

The man ______ his house.

Consider now the following set of responses emitted by n = 10 Ss:

bought (2), painted (3), sold (1), liked (2), was (1), embraced (1).

If one assumes that these word types do indeed represent the corpus of
words through which subjects search when attempting to fill in the gap,
the following observations seem in order. (1) The first four words seem
to belong in the S-class while the other two words seem to be N words.
(2) Note that it is very difficult to make up one's mind about which of
the S-words is probably the correct response. The decision in regard to
the two other words is much easier.
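
The bookkeeping behind this example can be sketched in a few lines of Python. This is only an illustrative tally, not part of the original analysis; the variable names are invented here, and the assignment of "was" and "embraced" to the N class is made by hand, following the judgment given in the text.

```python
from collections import Counter

# Responses to "The man ____ his house." as listed in the text (n = 10 Ss).
responses = (
    ["bought"] * 2 + ["painted"] * 3 + ["sold"] * 1
    + ["liked"] * 2 + ["was"] * 1 + ["embraced"] * 1
)

# Assumption: the N (nonsense) class is assigned by hand, following the text.
nonsense_types = {"was", "embraced"}

tokens_per_type = Counter(responses)
n = len(responses)                 # number of word tokens
k = len(tokens_per_type)           # number of word types
k1 = sum(1 for w in tokens_per_type if w in nonsense_types)  # N-types
k2 = k - k1                        # S-types

print(f"n = {n}, k = {k}, k1 = {k1}, k2 = {k2}")
# prints: n = 10, k = 6, k1 = 2, k2 = 4
```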

While it is most likely a simplification, it seems temporarily defensible
to assume that, if a systematic search process takes place, the total
number of decisions regarding the rejection or acceptance of a word as
the correct choice equals k_1 + a, where a is a value based on the possible
permutations among the k_2 S-class elements. This of course begs the
question in regard to equal attractiveness of all S-words. However,
presumably a > k_2. Furthermore, it seems reasonable to assume that, given
a fixed number of total responses (= n), an increase in reject-accept
decisions means a decrease in the probability of guessing the right word.

It was mentioned above that the size of the distribution of responses
to a particular deletion was shown to be related to the probability of a
correct answer. The speculations above, interpreted as a corollary to a
systematic search hypothesis, seem to indicate that the distribution of N
and S words within the k response types also might affect the success
probability. That is, it is hypothesized that both these parameters are
determinants of success probability. To state this hypothesis differently,
a regression equation of the form p = a - b_1 k + b_2 (N/S) was postulated,
where

p = proportion of Ss correctly filling in the blank
k = sample size (number of word types emitted)
N/S = the ratio of N-type responses and S-type responses.

Finally, it is recognized that whereas presence of the relationship
hypothesized conceivably admits both of a logical and exhaustive search and
of a more heuristic procedure, absence of such a relation seems more
damaging for the former.

Procedure

Materials. The data analyzed here were collected by administering five
versions of a 300-word cloze passage in which every fifth word had been
deleted. The passage was taken from a junior high school reading text.
The versions differed in the first and therefore in the subsequent words
deleted. Across the five versions all 300 words appeared as blanks once.
The passage was preceded and followed by paragraphs of 140 and 100 words,
respectively.

Subjects. Ss were 390 junior high students, nearly equally divided over
the 7th, 8th, and 9th grades.

Procedure. The Ss were randomly assigned to one of the five cloze versions.
The task was explained to them by means of an illustrative paragraph. They
were then asked to "read the story and fill in the exact words which you
think were left out."

Analysis. The protocols were hand scored and for each word the success
probability (the proportion of Ss guessing the word) was calculated. In the
analyses presented below only the nouns (n = 51) are included. For these
51 nouns the following statistics were calculated from the response
distribution of each: (1) k (= total word types); (2) k_1 (= word types
in the N-class); (3) k_2 (= word types in the S-class); (4) p_1 = k_1/k_2;
(5) p_2 = rk_1/rk_2, where rk_1 stands for the total number of responses in
the N-class and rk_2 for the total number of responses in the S-class; (6) p
(= proportion of correct responses). It must be noted that the
classification of word types as either N or S is subjective. In the majority
of the cases, however, classificatory judgments were rather unambiguous.
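
As an illustration of how these statistics follow from a single response distribution, the sketch below computes them in Python for a made-up blank. The function name `cloze_statistics`, its arguments, and the sample data are hypothetical; p_3 and p_4 are taken from the definitions in Table 1, and the N/S classification is assumed to be supplied by a human judge.

```python
from collections import Counter

def cloze_statistics(responses, nonsense_types, correct_word):
    """Return the per-blank statistics for one list of response tokens.

    Assumes at least one N-type and one S-type occur, so that no
    ratio divides by zero.
    """
    counts = Counter(responses)
    k = len(counts)                                   # total word types
    k1 = sum(1 for w in counts if w in nonsense_types)  # N word types
    k2 = k - k1                                       # S word types
    rk1 = sum(c for w, c in counts.items() if w in nonsense_types)  # N tokens
    rk2 = len(responses) - rk1                        # S tokens
    return {
        "k": k, "k1": k1, "k2": k2,
        "p1": k1 / k2,      # ratio of N and S word types
        "p2": rk1 / rk2,    # ratio of N and S word tokens
        "p3": k1 / rk1,     # N word types per N word token
        "p4": k2 / rk2,     # S word types per S word token
        "p": counts[correct_word] / len(responses),
    }

# Hypothetical data: ten Ss, deleted word "door".
sample = ["door"] * 5 + ["window"] * 2 + ["gate"] + ["of"] + ["quickly"]
stats = cloze_statistics(sample, {"of", "quickly"}, "door")
print(stats)
```

In this invented case every N-type is given by exactly one subject, so p_3 equals 1, which mirrors the pattern the paper reports for N-words.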

The variables included in the analysis are summarized in Table 1.



Insert Table 1 about here 

In preliminary analyses additional variables were included but dropped
because of redundancy. They were: k_1/k; k_2/k; rk_1/m; rk_2/m; k_1/rk_1;
k_2/rk_2.

A matrix of correlations among the variables was calculated and
analyzed using the BMD 02R Stepwise Regression program.

Results 

Table 2 presents the correlations among the 7 variables included in
analysis. The Spearman-Brown reliabilities of the five cloze versions
ranged from .82 to .91 with a median of .89.

Insert Table 2 about here

A few remarks are in order. First of all, it seems clear that k, the size
of the corpus in terms of word types, is most highly related to the success
probability. The more word types emitted, the smaller the probability of
success for a specific closure. This result simply confirms Taylor's
findings in this respect mentioned above. Taylor found a -.87 rank order
correlation between p and an information statistic calculated on the basis
of the number of word types emitted and the frequency with which each word
was chosen by the total sample of Ss. It now appears that a large portion
of this correlation can be explained by corpus size per se.

In addition it may be noted that k_1 and k_2 differ vastly in their
relationship to p_2 (= rk_1/rk_2). The reason for this is that whereas k_1
is highly related to rk_1 (r = .89, not shown in Table 2), k_2 is not
related to rk_2 (r = .24, not shown). This simply means that in the case
of N-words the number of word types varies closely with the number of
word tokens: not many N-word types attracted more than one respondent.

In order to further explore the relationship of word corpus
characteristics to p, the probability of successfully clozing the deletion,
two regression equations were computed. First, all variables were included
in the calculations. The resulting equation was:

p = .82198 - .02412k + .02458p_1

where p = proportion of subjects correctly filling in a blank,
k = the size of the distribution of word types at the point of
that blank, and
p_1 = the ratio of N and S word types.
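
To make the fitted equation concrete, the following Python sketch evaluates it for two hypothetical blanks. Only the three coefficients come from the text; the function name and the input values are invented for illustration.

```python
def predicted_success(k, p1):
    """First regression equation from the text:
    p = .82198 - .02412k + .02458p_1
    """
    return 0.82198 - 0.02412 * k + 0.02458 * p1

# Hypothetical blanks: few word types versus many word types,
# both with one N-type for every two S-types (p_1 = 0.5).
for k in (5, 25):
    print(k, round(predicted_success(k, 0.5), 3))
```

The predicted success probability drops as the number of word types grows, which is the direction the search hypothesis anticipates.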

It may be noted that the signs of the regression coefficients are in
the anticipated directions. A multiple R of .747 is associated with this
equation. The standard error of estimate is .185. The inclusion of the
first variable accounts for 50 per cent of the variance; the second
variable adds 6 per cent. The F-ratio associated with the proportion
accounted for by regression equals 30.33 (df = 2,48). After inclusion of
these two variables, no other variables possessed significant partial
correlations with the criterion. Parenthetically, it may be noted that
the simple correlation between p_1 and p was only .04. However, the
correlation between these variables with the effect of corpus size (k)
partialled out increased to .34.
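
The partialled-out value can be checked against the correlations reported for these variables with the standard first-order partial correlation formula. The Python sketch below is such a check, not the authors' computation; the function name is an assumption, and because the inputs are correlations rounded to two decimals (r between p_1 and p = .04, between p_1 and k = .28, between p and k = -.71), the result only approximates the reported .34.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# x = p_1, y = p, z = k; values as reported in the text and Table 2.
r = partial_correlation(0.04, 0.28, -0.71)
print(round(r, 2))
```

With these rounded inputs the formula gives roughly .35, in line with the reported value.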

In terms of the theoretical issue underlying this study, an interesting
question remains unanswered by the above results. Apparently success
probability (p) is highly related to corpus size. However, given a corpus
of a specific size, what characteristics of the corpus determine p? A
complete answer to this question would indeed shed a great amount of light
on what processes occur when a subject is faced with a specific blank for
which (as is theoretically always the case) the corpus from which he
selects is fixed in size.

Presumably, this question can best be researched by studying the variables
mentioned above, quite possibly in conjunction with other variables, for a
number of blanks with identical corpus sizes. To generate such data is
rather difficult and costly. Out of curiosity the authors reanalyzed
their data, removing the size variable k from the system of correlations.
Again using stepwise regression, the following equation resulted:

p = .77250 - .[?]0143k_1 + .06439p_2 - 1.56616p_4

where p = proportion of subjects correctly filling in a blank,
k_1 = size of the distribution of N words,
p_2 = the ratio of the N and S word tokens, and
p_4 = the ratio of S word types and S word tokens.

The resulting multiple R equals .719 (std. error of est. = .196). The
variables were entered in the equation in the following order: p_4, k_1,
p_2, accounting for respectively 42, 5, and 5 per cent of the variance
(F = 16.79 with 3 and 47 df). The total percentage accounted for (52%)
compares not too unfavorably with that of the first equation (56%).
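
The reported percentages of variance and F-ratios are tied to the multiple R's by the usual identities: R squared is the proportion of variance accounted for, and F = (R^2 / m) / ((1 - R^2) / (N - m - 1)). The Python sketch below recomputes both as a consistency check. It assumes N = 51 blanks (the nouns analyzed) and m predictors in each equation; the function name is illustrative.

```python
def f_from_multiple_r(multiple_r, n_cases, n_predictors):
    """Return (R squared, F) for a regression with the given multiple R."""
    r2 = multiple_r ** 2
    df_resid = n_cases - n_predictors - 1
    f = (r2 / n_predictors) / ((1 - r2) / df_resid)
    return r2, f

# Multiple R's as reported: .747 with two predictors, .719 with three.
for label, R, m in (("first equation", 0.747, 2),
                    ("second equation", 0.719, 3)):
    r2, f = f_from_multiple_r(R, 51, m)
    print(f"{label}: R^2 = {r2:.2f}, F = {f:.1f}")
```

Starting from the rounded R's, the recomputed values fall close to the reported 56 and 52 per cent and to the F-ratios of 30.33 and 16.79, with residual degrees of freedom of 48 and 47.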






Discussion 

The production of cloze responses is a highly complex activity, most
likely involving utilization of a great many syntactic and semantic clues,
alternately at the level of conscious consideration of alternatives and
automatically acting.

One difficulty in interpreting the present findings in terms of their
explanatory contribution is the concept of the corpus of words emitted as
an approximation to a hypothetical corpus of words being searched for the
correct response. Much uncertainty exists as to the conditions under which
the overt corpus may be taken to be representative of its postulated covert
counterpart. Are, for instance, overly stringent time constraints during
performance of the task related to distortion of the representativeness of
the sampled corpus? Questions such as this eventually need to be answered
in order to achieve a satisfactory description of the process of closure.

At this point it seems fair to say that the corollary derived from a
general search hypothesis seems sustained. Not only is distribution related
to p (as would be expected on the basis of any kind of search hypothesis),
but relations pertinent to the ratio of N words (or decision reducing
elements) and S words (or decision increasing elements) are also relevant.
This seems to be borne out by both regression solutions. As mentioned above,
the equations obtained admit both of a systematic and a heuristic search
procedure. Only nonsignificant regression coefficients of all variables
related to the N/S ratio could be taken as evidence, however weak, of
absence of systematic search. Presumably, it only makes sense to speak of
decision reducing or decision increasing elements if some kind of comparison
of word tokens prior to emitting a response takes place. In the light of the
evidence, it seems unlikely that at no point in the response formation
period such systematic comparison is engaged in by the responding organism.
This is a contraindication of a completely heuristic procedure.

For the moment, it appears that experimentation is needed to reveal
the extent or nature of organization in the search process. At this point
it seems not unlikely that the search process can be characterized as
systematic in part. That is, while a part of the time spent in searching for
the correct solution may be used for heuristic searching, this does not
necessarily exclude the option at a particular moment in the search process
to revert to a much more careful, deliberate and systematic analysis of the
various choices available.







Table 1

Description of Variables

Variable    Description

k           number of total word types emitted
k_1         number of total word types of class N emitted
k_2         number of total word types of class S emitted
p_1         ratio of N and S word types
p_2         ratio of N and S word tokens
p_3         ratio of N word types and N word tokens
p_4         ratio of S word types and S word tokens
p           proportion of subjects correctly filling in a
            given blank

Table 2

Matrix of Correlations among All Variables*

        (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)
          k    k_1    k_2    p_1    p_2    p_3    p_4      p

(1)    1.00    .71    .54    .28    .38    [?]    .78   -.71
(2)           1.00   -.19    .73    .61   -.38    .31   -.41
(3)                  1.00    [?]   -.18    .36    .74   -.48
(4)                         1.00    .45   -.27   -.14    .04
(5)                                1.00   -.36    .49   -.26
(6)                                       1.00    .00    .10
(7)                                              1.00   -.65

* For any r ≥ .25, p ≤ .05; for any r ≥ .36, p ≤ .01.